Abstract

A limitation of permutation tests is that they require exchangeability. It is shown that in generalized linear models one can construct permutation tests from score statistics in particular cases. When the observations are not exchangeable under the null hypothesis, a representation in terms of Cox-Snell residuals makes it possible to develop an approach based on an expected permutation p-value (Eppv); this is applied to the logistic regression model. A small simulation study and an illustration with real data are given.

Résumé

A limitation of permutation tests is that they rest on an exchangeability hypothesis. It is shown that in generalized linear models one can construct permutation tests from the score statistic in particular cases. When the observations are not exchangeable under the null hypothesis, a representation in terms of Cox-Snell residuals makes it possible to develop an approach based on the expectation of the permutation p-value; this is applied to the logistic regression model.

Keywords: Exchangeability, Permutation tests, Residuals, Score Test, 
Logistic regression, p-values. 

Abridged French version

Consider a statistic $T(Y)$ for testing a hypothesis $H_0$. $H_0$ is rejected if $T(Y) > c_\alpha$, with $c_\alpha$ chosen such that the type I error is $\alpha$. The p-value is defined as a random variable by:

$$pv[T(Y)] = E\{1_{\{T(Y^*) > T(Y)\}} \mid \sigma(Y)\}$$

where $Y^*$ is a variable independent of $Y$ but with the same distribution. Permutation tests are based on conditioning on the order statistics $Y_{(o)} = (Y_{(1)}, \ldots, Y_{(n)})$.

The permutation p-value is:

$$ppv[T(Y)] = E\{1_{\{T(Y^*) > T(Y)\}} \mid \sigma(Y) \vee \sigma(\{Y^*_{(o)} = Y_{(o)}\})\}$$

Suppose that we can represent $Y$ as $Y = g(\varepsilon)$ with $\varepsilon$ exchangeable. Such a representation was proposed by Cox and Snell. Then $T(Y) = T[g(\varepsilon)] = S(\varepsilon)$. If $\varepsilon$ were observed, we could use the permutation p-value:



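$$ppv[S(\varepsilon)] = E\{1_{\{S(\varepsilon^*) > S(\varepsilon)\}} \mid \sigma(\varepsilon) \vee \sigma(\{\varepsilon^*_{(o)} = \varepsilon_{(o)}\})\}$$

In general $\varepsilon$ is not observed. We therefore propose to take the expectation:

$$\mathrm{Eppv}[S(\varepsilon)] = E\{ppv[S(\varepsilon)] \mid \sigma(Y)\}$$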

The expectation may depend on nuisance parameters $\gamma \in \Gamma$. In that case one can either replace them by their maximum likelihood estimators or compute $\max_{\gamma \in \Gamma} \mathrm{Eppv}(\gamma)$. This approach is adapted to a logistic regression model.

1 Introduction 

Permutation tests can be useful as distribution-free tests and also have exact size (as opposed to the asymptotic validity of most conventional tests). However, the use of permutation tests in regression problems has been limited because valid permutation tests exist only if the observations are exchangeable under the null hypothesis. A vector $Y$ has an exchangeable distribution if $PY$ has the same distribution as $Y$ for any permutation matrix $P$. If $Y$ is exchangeable and we consider a test statistic $T(Y)$, a permutation test is obtained by conditioning on the order statistic $Y_{(o)} = (Y_{(1)}, \ldots, Y_{(n)})$ [8]. The assumption of exchangeability, although slightly less stringent than the assumption of independent and identically distributed (i.i.d.) observations, is still quite restrictive and does not hold, for instance, in regression problems.

There have been many applications of permutation tests; a particularly interesting permutation test was proposed by Mantel [?]. Permutation tests are often based on score tests. For some theory about permutation tests see [1], and for score tests see [2] and [1].
In this paper we propose a new approach, called the expected permutation p-value (Eppv), based on permuting an unobserved exchangeable variable. Section 2 presents permutation versions of score tests in generalized linear models. In Section 3 some theory about p-values, permutation and conditioning is developed and the Eppv is presented. This approach is then applied to the logistic regression model in Section 4. Section 5 presents a short simulation study. An illustration with real data is given in Section 6, which concludes the paper.

2 Permutation score tests 

Consider a sample of independent random variables $Y_i$, $i = 1, \ldots, n$, and assume a generalized linear model; the contribution of observation $i$ to the likelihood is:

$$f(Y_i; \theta_i, \eta) = \exp\{\eta^{-1}[\theta_i Y_i - b(\theta_i)] + c(Y_i, \eta)\}$$

with $E(Y_i) = b'(\theta_i) = \mu_i$ and $\theta_i = Z_i^\top \beta$, where $Z_i^\top = (z_i^1, \ldots, z_i^p)$ is a row vector of explanatory variables (considered here as deterministic) and $\beta$ is a $p \times 1$ vector of regression coefficients; here $\eta$ denotes the dispersion parameter. The score equation, obtained by setting to zero the derivative of the log-likelihood $L$ with respect to $\beta$, is $Z^\top R = 0$, where $Z$ is the $n \times p$ matrix of explanatory variables $z^j$ and $R = (R_1, \ldots, R_n)^\top$ is the vector of residuals $R_i = Y_i - \mu_i(\beta)$. Thus the residuals evaluated at the maximum likelihood estimate of $\beta$ are orthogonal to the space spanned by the explanatory variables.
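The score equation follows from a one-line differentiation of the log-likelihood; the derivation below assumes the canonical link $\theta_i = Z_i^\top \beta$ used above, and shows the logistic special case for later reference. Since $\eta > 0$, setting the derivative to zero is equivalent to $Z^\top R = 0$.

```latex
L(\beta) = \sum_{i=1}^{n} \Big\{ \eta^{-1}\big[\theta_i Y_i - b(\theta_i)\big] + c(Y_i, \eta) \Big\},
\qquad \theta_i = Z_i^\top \beta,
\\[4pt]
\frac{\partial L}{\partial \beta}
  = \eta^{-1} \sum_{i=1}^{n} \big[Y_i - b'(\theta_i)\big] \frac{\partial \theta_i}{\partial \beta}
  = \eta^{-1} \sum_{i=1}^{n} (Y_i - \mu_i)\, Z_i
  = \eta^{-1} Z^\top R ,
\\[4pt]
\text{logistic case: } b(\theta) = \log(1 + e^{\theta}), \quad
b'(\theta) = \frac{e^{\theta}}{1 + e^{\theta}} = \pi, \quad \eta = 1 .
```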

If we add an explanatory variable indexed by $p+1$, the model becomes $\theta_i = Z_i^\top \beta + z_i^{p+1} \beta_{p+1}$. Let us denote the parameters $\gamma = (\eta, \beta, \beta_{p+1})$. The score statistic for testing $H_0$: "$\beta_{p+1} = 0$" has the linear form:

$$S(Y) = \frac{\partial L}{\partial \beta_{p+1}}\Big|_{\beta_{p+1} = 0} = (z^{p+1})^\top R, \qquad (1)$$

where $z^{p+1} = (z_1^{p+1}, \ldots, z_n^{p+1})^\top$ is the vector of values of explanatory variable $p+1$ and $R$ is the vector of residuals in the model not including variable $p+1$ (the constant factor $\eta^{-1}$ is omitted, since it does not affect the test).
A test for $H_0$: "$\beta_{p+1} = 0$" may be based on the asymptotic distribution of $n^{-1/2} S(Y)$. Let us call $\phi(Y)$ the critical function of the test ($\phi(Y) = 1$: $H_0$ rejected; $\phi(Y) = 0$: $H_0$ not rejected). Except in simple cases it is not possible to construct exact tests, that is, tests with $E_\gamma[\phi(Y)] = \alpha$ for all $\gamma \in \omega$, where $\omega$ is the subset of the parameter space corresponding to $H_0$. For small sample sizes the difference between the nominal and true type I error rates may be large. In regression models it is tempting to construct tests based on permutation of the residuals in the score statistic [10]. Fisher's exact test can be shown to be a permutation of the residuals in a score test, in a case where the observations are exchangeable under the null hypothesis. However, as soon as there is one explanatory variable under the null hypothesis, generally neither $Y$ nor $R$ is exchangeable; hence, valid permutation tests cannot be constructed [1].
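As a hedged illustration of equation (1) and of the residual-permutation idea cited above, the sketch below fits a logistic model under $H_0$ (intercept plus one covariate `z1`) by Newton-Raphson, computes $S(Y) = (z^{p+1})^\top R$ for a tested covariate, and permutes the residuals. The function names, the toy data, the one-sided alternative, the number of random permutations and the `(count + 1) / (n_perm + 1)` convention are all assumptions of this sketch. Because $R$ is not exchangeable here, the resulting p-value is only approximate.

```python
import math
import random

# Hypothetical demo data (no separation, so the null MLE exists)
Y_DEMO = [1, 0, 0, 1, 0, 0, 1, 1, 0, 1]
Z1_DEMO = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.3, 0.5, 1.0, 1.5, 2.0]
Z2_DEMO = [0.2, -1.0, 0.5, 1.3, -0.7, 0.1, 0.9, -0.4, 1.1, -0.2]


def fit_logistic_null(y, z1, iters=50, tol=1e-10):
    """Newton-Raphson ML fit of logit(pi_i) = b0 + b1 * z1_i; returns fitted pi_i."""
    b0 = b1 = 0.0
    for _ in range(iters):
        pi = [1.0 / (1.0 + math.exp(-(b0 + b1 * z))) for z in z1]
        w = [p * (1.0 - p) for p in pi]
        # Score U = Z^T (y - pi) with design Z = [1, z1]
        u0 = sum(yi - p for yi, p in zip(y, pi))
        u1 = sum((yi - p) * z for yi, p, z in zip(y, pi, z1))
        # Fisher information Z^T W Z (2 x 2), inverted explicitly
        i00 = sum(w)
        i01 = sum(wi * z for wi, z in zip(w, z1))
        i11 = sum(wi * z * z for wi, z in zip(w, z1))
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * z))) for z in z1]


def score_statistic(z_test, residuals):
    """S(Y) = (z^{p+1})^T R, equation (1)."""
    return sum(z * r for z, r in zip(z_test, residuals))


def pr_test(y, z1, z_test, n_perm=2000, seed=0):
    """One-sided Monte Carlo p-value of the permutation-of-residuals (PR) score test."""
    rng = random.Random(seed)
    pi_hat = fit_logistic_null(y, z1)
    r = [yi - p for yi, p in zip(y, pi_hat)]
    s_obs = score_statistic(z_test, r)
    count = 0
    for _ in range(n_perm):
        r_perm = r[:]
        rng.shuffle(r_perm)
        if score_statistic(z_test, r_perm) >= s_obs:
            count += 1
    return (count + 1) / (n_perm + 1)


if __name__ == "__main__":
    print(pr_test(Y_DEMO, Z1_DEMO, Z2_DEMO))
```

At the fitted null model the residuals satisfy $Z^\top R = 0$ numerically, which is a useful sanity check on the fit before permuting.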
3 Some theory about p-values, permutation and conditioning

3.1 p-values 

Consider a test $\phi(Y)$ based on a statistic $T(Y)$. We examine the case where the decision to reject $H_0$ is taken if $T(Y) > c_\alpha$, with $c_\alpha$ chosen such that $E_\gamma[\phi(Y)] = \alpha$. A definition of the p-value which allows us to treat it as a random variable (and hence to study its properties) is

$$pv[T(Y)] = E_\gamma[1_{\{T(Y^*) > T(Y)\}} \mid \sigma(Y)]$$

where $Y^*$ is a random variable independent of $Y$ but with the same distribution, and $\sigma(Y)$ is the sigma-algebra generated by $Y$. See [11] for properties of the conditional expectation. We can construct a size-$\alpha$ test by rejecting $H_0$ if $pv[T(Y)] < \alpha$, that is: $\phi(Y) = 1_{\{pv[T(Y)] < \alpha\}}$.
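The definition of $pv[T(Y)]$ can be checked numerically: holding the observed $Y$ fixed, draw independent copies $Y^*$ from the null distribution and count how often $T(Y^*)$ exceeds $T(Y)$. The sketch assumes a simple, fully specified null (a hypothetical $Y_1, \ldots, Y_n$ i.i.d. $N(0, 1)$ with $T(Y) = \sum_i Y_i$), chosen only because the exact value $1 - \Phi(T(Y)/\sqrt{n})$ is then available for comparison; all names are illustrative.

```python
import math
import random


def mc_pvalue(t_obs, statistic, sample_null, n_draws=20000, seed=0):
    """Monte Carlo estimate of pv[T(Y)] = P(T(Y*) > t_obs | Y), with Y* drawn from the null."""
    rng = random.Random(seed)
    exceed = sum(1 for _ in range(n_draws) if statistic(sample_null(rng)) > t_obs)
    return exceed / n_draws


def sum_stat(y):
    """T(Y) = Y_1 + ... + Y_n."""
    return sum(y)


def normal_null(n):
    """Hypothetical simple null: Y_1, ..., Y_n i.i.d. N(0, 1)."""
    return lambda rng: [rng.gauss(0.0, 1.0) for _ in range(n)]


def exact_pvalue_sum_normal(t_obs, n):
    """Under this null T(Y*) ~ N(0, n), so P(T(Y*) > t) = 0.5 * erfc(t / sqrt(2 n))."""
    return 0.5 * math.erfc(t_obs / math.sqrt(2.0 * n))


if __name__ == "__main__":
    y = [0.8, 1.1, -0.3, 0.9, 0.4]
    t = sum_stat(y)
    p_mc = mc_pvalue(t, sum_stat, normal_null(len(y)))
    p_exact = exact_pvalue_sum_normal(t, len(y))
    alpha = 0.05
    # phi(Y) = 1{pv < alpha}
    print(p_mc, p_exact, "reject" if p_mc < alpha else "do not reject")
```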

3.2 Conditional p-values 

We may define a p-value conditional on $\mathcal C$, where $\mathcal C \subset \sigma(Y, Y^*)$, as:

$$pv_{\mathcal C}[T(Y)] = E_\gamma[1_{\{T(Y^*) > T(Y)\}} \mid \sigma(Y) \vee \mathcal C].$$

Conditional tests can be constructed as $\phi(Y) = 1_{\{pv_{\mathcal C}[T(Y)] < \alpha\}}$. We have $E_\gamma[\phi(Y) \mid \mathcal C] = \alpha$; it follows that we also have $E_\gamma[\phi(Y)] = \alpha$. That is, marginally the test has size $\alpha$, but the critical regions (and the power) depend on $\mathcal C$. The conditional approach has been advocated in two different situations [7].
The first arises if we have a sufficient statistic $C$ for the family of measures $\mathcal P_\omega = \{P_\gamma, \gamma \in \omega\}$, where $\omega = \bar H \cap \bar K$ is the frontier between the sets representing the null ($H$) and the alternative ($K$) hypotheses. If $\mathcal C$ is the sigma-algebra generated by $C$, then $pv_{\mathcal C}[T(Y)]$ no longer depends on $\gamma$, so that we obtain a similar test: $E_\gamma[\phi(Y)] = \alpha$ for all $\gamma \in \omega$. Such a test is said to have Neyman structure with respect to $C$. As an example, consider the case where we observe variables $Y_i$, $i = 1, \ldots, n$, which are i.i.d. under the family of measures $\mathcal P_\omega$. Then the order statistic $Y_{(o)} = (Y_{(1)}, \ldots, Y_{(n)})$ is sufficient for $\gamma$, and if we take $\mathcal C = \sigma(\{Y^*_{(o)} = Y_{(o)}\})$ we obtain a permutation test, that is, we have $E[\phi(Y) \mid Y_{(o)}] = \alpha$. Due to the discrete character of the conditional distribution of $T(Y)$, it is not possible to achieve $E[\phi(Y) \mid Y_{(o)}] = \alpha$ for all $\alpha$ except by resorting to randomisation; we will neglect this problem in the sequel.
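Conditioning on the order statistic makes all $n!$ rearrangements of the observed values equally likely under exchangeability, so for small $n$ the permutation p-value can be computed by full enumeration. In this sketch (hypothetical names and data), ties with the observed value are counted as extreme by default, the usual conservative convention; `strict=True` follows the strict inequality as written in the definition above. Enumeration costs $n!$ evaluations, so larger samples would need random permutations instead.

```python
from itertools import permutations


def permutation_pvalue(y, statistic, strict=False):
    """Exact permutation p-value: given the order statistic, each of the n!
    arrangements of y is equally likely under exchangeability."""
    t_obs = statistic(y)
    n_total = 0
    n_extreme = 0
    for perm in permutations(y):
        t = statistic(list(perm))
        n_total += 1
        if (t > t_obs) if strict else (t >= t_obs):
            n_extreme += 1
    return n_extreme / n_total


if __name__ == "__main__":
    z = [0, 0, 0, 1, 1, 1]                     # hypothetical group indicator
    y = [12, 7, 19, 28, 31, 22]                # hypothetical i.i.d. observations
    stat = lambda v: sum(zi * vi for zi, vi in zip(z, v))
    print(permutation_pvalue(y, stat))         # 36 of the 720 arrangements reach T(Y)
```

Here the observed group holds the three largest values, so only the $3! \times 3! = 36$ arrangements that keep that set in the group attain the observed statistic.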

The second situation arises in the presence of ancillary statistics $Z$: here the motivation is to perform the test adapted to the situation fixed by the particular realization of $Z$. We may also consider S-ancillary statistics, whose distribution depends on an unknown parameter $\xi$ while the distribution of $Y$ given $Z$ does not depend on $\xi$. While the unconditional p-value depends on both $\gamma$ and $\xi$, the p-value conditional on $Z$ does not depend on $\xi$. As an example, consider a regression model where explanatory variables $Z_i$ are associated with response variables $Y_i$: the regression model specifies the conditional distribution of $Y_i$ given $Z_i$, which depends on $\gamma$, while the marginal distribution of $Z_i$ depends on $\xi$ only. It is natural to consider tests which are conditional on $Z$; in our formalism, for a test statistic $T(Y, Z)$ we then compute the conditional p-value $pv_{\mathcal C}[T(Y, Z)]$ with $\mathcal C = \sigma(\{Z^* = Z\})$.
The two situations have in common the fact that there is a reduction of the number of parameters on which the p-value depends. In the particular case where there is a sufficient statistic for $\gamma$, the p-value does not depend on any parameter. However, in complex problems this may not be achievable without losing too much power. One possibility is to replace $pv_{\mathcal C}[T(Y); \gamma]$ by $pv_{\mathcal C}[T(Y); \hat\gamma]$, where $\hat\gamma$ is an estimator of $\gamma$. We would like a procedure such that $|pv_{\mathcal C}[T(Y); \hat\gamma] - pv_{\mathcal C}[T(Y); \gamma]|$ is as small as possible; choosing a large $\mathcal C$ may help to reduce the variance of this random variable. Another way is to apply a minimax argument. If it is known that $\gamma$ belongs to a compact set $\Gamma$, then we may base a test on $\max_{\gamma \in \Gamma} pv_{\mathcal C}[T(Y); \gamma]$. This leads to a test of size lower than or equal to $\alpha$: if $\gamma_0 \in \Gamma$ is the true value, rejecting when the maximum is below $\alpha$ implies $pv_{\mathcal C}[T(Y); \gamma_0] < \alpha$, an event of probability at most $\alpha$.

3.3 The expected conditional p-value 

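Consider the case where $Y = g(\varepsilon)$, where $g(\cdot)$ is a non-decreasing function; if $g$ is not one-to-one we have $\sigma(Y) \subset \sigma(\varepsilon)$. A statistic $T(Y)$ then defines a statistic $S(\varepsilon) = T(g(\varepsilon))$. We may consider the p-value $pv_{\mathcal C}[S(\varepsilon)] = E_\gamma[1_{\{S(\varepsilon^*) > S(\varepsilon)\}} \mid \sigma(\varepsilon) \vee \mathcal C]$, where $\mathcal C \subset \sigma(\varepsilon^*, \varepsilon)$. Since in general this is not $\sigma(Y)$-measurable, we may consider its expectation $\mathrm{Epv}_{\mathcal C}[S(\varepsilon)] = E_\gamma[pv_{\mathcal C}[S(\varepsilon)] \mid \sigma(Y)]$. A size-$\alpha$ test can be constructed from this expected conditional p-value as usual.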
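In practice the conditional expectation over $\varepsilon$ given $Y$ has to be approximated. Assuming one can draw $\varepsilon^{(1)}, \ldots, \varepsilon^{(M)}$ independently from the law of $\varepsilon$ given $Y$ (with $\gamma$ known or fixed), a Monte Carlo average gives, by the law of large numbers conditional on $Y$:

```latex
\mathrm{Epv}_{\mathcal C}[S(\varepsilon)]
  = E_\gamma\big[\, pv_{\mathcal C}[S(\varepsilon)] \,\big|\, \sigma(Y) \big]
  \approx \frac{1}{M} \sum_{m=1}^{M} pv_{\mathcal C}\big[S(\varepsilon^{(m)})\big],
\qquad
\phi(Y) = 1_{\{\mathrm{Epv}_{\mathcal C}[S(\varepsilon)] < \alpha\}} .
```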

This approach can in particular be connected with the Cox-Snell family, which represents $Y$ as $Y = g(\varepsilon)$, where $\varepsilon$ is exchangeable. Such a representation was proposed by Cox and Snell [3] to define residuals. If $\varepsilon$ were observed, a permutation test could be constructed by conditioning on the order statistic of $\varepsilon$. It is thus appealing to use an expected conditional p-value with $\mathcal C = \sigma(\{\varepsilon^*_{(o)} = \varepsilon_{(o)}\})$. Such a p-value will be called the expected permutation p-value (Eppv).

Numerically this method is easy to implement: draw $\varepsilon$ at random from its distribution conditional on $Y$; compute the permutation p-value; take the mean of the p-values over a sufficient number of draws. However, the distribution of $\varepsilon$ conditional on $Y$ may depend on parameters that have to be estimated (see Sections 3.2 and 4).
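The three-step recipe above can be sketched generically. The caller supplies a sampler for $\varepsilon$ given $Y$ (the hypothetical argument `draw_eps_given_y`) and the statistic $S$. The inner permutation p-value is itself estimated with random permutations, counting ties as extreme and using the `(hits + 1) / (n_perm + 1)` convention; both are assumptions of this sketch rather than choices stated in the text. The demo uses the degenerate case where $g$ is one-to-one, so $\varepsilon$ is fully determined by $Y$ and the Eppv reduces to an ordinary permutation p-value.

```python
import random


def eppv(y, draw_eps_given_y, statistic, n_eps=200, n_perm=200, seed=0):
    """Expected permutation p-value (Monte Carlo sketch).

    draw_eps_given_y(y, rng) -> one draw of the latent vector eps from its law given Y
    statistic(eps)           -> S(eps)
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_eps):
        eps = list(draw_eps_given_y(y, rng))   # step 1: draw eps given Y
        s_obs = statistic(eps)
        hits = 0
        for _ in range(n_perm):                # step 2: permutation p-value for this draw
            eps_perm = eps[:]
            rng.shuffle(eps_perm)
            if statistic(eps_perm) >= s_obs:
                hits += 1
        total += (hits + 1) / (n_perm + 1)
    return total / n_eps                       # step 3: average over the draws


if __name__ == "__main__":
    z = [0, 0, 0, 1, 1, 1]
    y = [12, 7, 19, 28, 31, 22]
    stat = lambda v: sum(zi * vi for zi, vi in zip(z, v))
    # Degenerate case: eps is a one-to-one function of Y (here eps = Y)
    print(eppv(y, lambda obs, rng: list(obs), stat))
```

In this degenerate case the exact permutation p-value is 36/720 = 0.05, and the Monte Carlo estimate lands slightly above it because of the "+1" convention.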

4 Applications of the Eppv approach to the logistic model

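A logistic regression model is specified by $\Pr(Y_i = 1) = \pi_i$, $\mathrm{logit}(\pi_i) = Z_i^\top \beta$. It can be represented in terms of latent i.i.d. variables $\varepsilon_i$ having a uniform distribution on $[0, 1]$:

$$Y_i = 1_{\{\varepsilon_i < \pi_i\}}.$$

A score test for $H_0$: "$\beta_{p+1} = 0$" is $T(Y) = S(\varepsilon) = (z^{p+1})^\top (1_{\{\varepsilon < \pi\}} - \pi)$, with obvious vector notation. For a permutation test only the first part, $(z^{p+1})^\top 1_{\{\varepsilon < \pi\}}$, is needed, since $(z^{p+1})^\top \pi$ does not change when $\varepsilon$ is permuted. However, because $\sum_i 1_{\{\varepsilon_i < \pi_i\}}$ is not constant under permutation of $\varepsilon$, the test is not invariant under a change of origin of $z$: one of the two vectors involved in this scalar product must be centered, a concept also related to that of the "clean" form as in [1]. Thus the proposed statistic is

$$T(Y) = S(\varepsilon) = \sum_i z_i^{p+1}\Big(1_{\{\varepsilon_i < \pi_i\}} - n^{-1}\sum_j 1_{\{\varepsilon_j < \pi_j\}}\Big) = \sum_i (z_i^{p+1} - \bar z^{p+1})\, 1_{\{\varepsilon_i < \pi_i\}}$$

(where $\bar z^{p+1}$ is the mean of $z^{p+1}$), which is invariant.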
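The need for centering can be seen on a toy example: shifting $z$ by a constant $c$ changes the uncentered part by $c$ times the number of indicators equal to 1, and that number varies across permutations of $\varepsilon$, whereas the centered statistic is unaffected. The values below are hypothetical, chosen so that the indicator count differs between the two arrangements.

```python
def s_uncentered(z, eps, pi):
    """Uncentered part (z^{p+1})^T 1{eps < pi}."""
    return sum(zi for zi, e, p in zip(z, eps, pi) if e < p)


def s_centered(z, eps, pi):
    """Centered statistic sum_i (z_i - zbar) 1{eps_i < pi_i}."""
    zbar = sum(z) / len(z)
    return sum(zi - zbar for zi, e, p in zip(z, eps, pi) if e < p)


# Hypothetical values: 3 indicators equal 1 for EPS, 2 for its reversal
PI = [0.2, 0.8, 0.5, 0.9, 0.1]
EPS = [0.1, 0.7, 0.6, 0.95, 0.05]
EPS_PERM = EPS[::-1]                  # one particular permutation of eps
Z = [1.0, -2.0, 0.5, 3.0, -1.0]
Z_SHIFT = [zi + 10.0 for zi in Z]     # change of origin by c = 10

if __name__ == "__main__":
    for e in (EPS, EPS_PERM):
        # uncentered: shift adds c * (number of indicators); centered: unchanged
        print(s_uncentered(Z_SHIFT, e, PI) - s_uncentered(Z, e, PI),
              s_centered(Z_SHIFT, e, PI) - s_centered(Z, e, PI))
```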

For computing the Eppv we draw $\varepsilon$ from its conditional distribution given $Y$, which is:

• $\varepsilon_i \sim U[0, \pi_i]$ if $Y_i = 1$;
• $\varepsilon_i \sim U[\pi_i, 1]$ if $Y_i = 0$.
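The conditional law of $\varepsilon$ given $Y$ is straightforward to sample; the short sketch below (hypothetical names and probabilities) draws $\varepsilon$ given $Y$ and checks that the latent representation $Y_i = 1_{\{\varepsilon_i < \pi_i\}}$ reproduces the observed responses.

```python
import random


def draw_eps_given_y(y, pi, rng):
    """One draw of eps from its law given Y:
    eps_i ~ U[0, pi_i] if Y_i = 1, eps_i ~ U[pi_i, 1] if Y_i = 0."""
    return [rng.uniform(0.0, p) if yi == 1 else rng.uniform(p, 1.0)
            for yi, p in zip(y, pi)]


def y_from_eps(eps, pi):
    """Latent representation Y_i = 1{eps_i < pi_i}."""
    return [1 if e < p else 0 for e, p in zip(eps, pi)]


if __name__ == "__main__":
    rng = random.Random(0)
    pi = [0.2, 0.5, 0.7, 0.9]      # hypothetical success probabilities
    y = [0, 1, 1, 0]
    eps = draw_eps_given_y(y, pi, rng)
    print(eps, y_from_eps(eps, pi) == y)
```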

If the $\pi_i$ are known, an exact permutation test follows. In practice one may replace $\pi_i$ by an estimator $\hat\pi_i$, the maximum likelihood estimator of $\pi_i$ under $H_0$, leading to an approximate test. It is conjectured that the type I error probability is $\alpha + O(n^{-1/2})$, as when using the asymptotic distribution of the standardized score statistic. However, for small sample sizes the Eppv approach may perform better because of the non-standard conditioning. Another possibility is to apply the minimax approach. Consider the case $p = 1$ where it is known that $\beta_1 \in [a, b]$. One can find $\max_{\beta_1 \in [a, b]} \mathrm{Eppv}(\beta_1)$, and this leads to a test with type I error probability lower than or equal to $\alpha$. In practice the maximum can be found numerically.
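Putting the pieces of this section together, the sketch below computes a Monte Carlo Eppv for logistic regression with one adjustment covariate `z1` under $H_0$ and a tested covariate `z_test`: it draws $\varepsilon$ from its conditional law given $Y$, uses the centered statistic, and estimates each permutation p-value with random permutations. `eppv_logistic` takes $\pi$ as input, so it can be called either with plug-in ML estimates from `fit_null_pi` (intercept plus `z1`) or inside `eppv_minimax`, which for the case $p = 1$ (assumed here to mean no intercept, $\pi_i = \mathrm{expit}(\beta_1 z_i^1)$) maximizes over a finite grid standing in for $[a, b]$. The data, grid, Monte Carlo sizes, one-sided alternative and tie and "+1" conventions are all assumptions of this sketch.

```python
import math
import random

# Hypothetical data: binary response, adjustment covariate z1, tested covariate z2
Y = [1, 0, 0, 1, 0, 0, 1, 1, 0, 1]
Z1 = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.3, 0.5, 1.0, 1.5, 2.0]
Z2 = [0.2, -1.0, 0.5, 1.3, -0.7, 0.1, 0.9, -0.4, 1.1, -0.2]


def expit(t):
    """Numerically stable inverse logit."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_null_pi(y, z1, iters=50, tol=1e-10):
    """Newton-Raphson ML fit of logit(pi_i) = b0 + b1 z1_i under H0 (assumes no separation)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        pi = [expit(b0 + b1 * z) for z in z1]
        w = [p * (1.0 - p) for p in pi]
        u0 = sum(yi - p for yi, p in zip(y, pi))
        u1 = sum((yi - p) * z for yi, p, z in zip(y, pi, z1))
        i00, i01 = sum(w), sum(wi * z for wi, z in zip(w, z1))
        i11 = sum(wi * z * z for wi, z in zip(w, z1))
        det = i00 * i11 - i01 * i01
        d0, d1 = (i11 * u0 - i01 * u1) / det, (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return [expit(b0 + b1 * z) for z in z1]


def centered_stat(zc, eps, pi):
    """S(eps) = sum_i (z_i - zbar) 1{eps_i < pi_i}; zc is the centered tested covariate."""
    return sum(c for c, e, p in zip(zc, eps, pi) if e < p)


def eppv_logistic(y, z_test, pi, n_eps=200, n_perm=200, seed=0):
    """Monte Carlo Eppv for H0: beta_{p+1} = 0 (one-sided: large S counts against H0)."""
    rng = random.Random(seed)
    zbar = sum(z_test) / len(z_test)
    zc = [z - zbar for z in z_test]
    total = 0.0
    for _ in range(n_eps):
        # eps_i | Y_i ~ U[0, pi_i] if Y_i = 1 and U[pi_i, 1] if Y_i = 0
        eps = [rng.uniform(0.0, p) if yi == 1 else rng.uniform(p, 1.0)
               for yi, p in zip(y, pi)]
        s_obs = centered_stat(zc, eps, pi)
        hits = 0
        for _ in range(n_perm):
            perm = eps[:]
            rng.shuffle(perm)            # permute eps; pi_i stays attached to unit i
            if centered_stat(zc, perm, pi) >= s_obs - 1e-12:
                hits += 1
        total += (hits + 1) / (n_perm + 1)
    return total / n_eps


def eppv_minimax(y, z1, z_test, beta_grid, **kw):
    """Case p = 1 without intercept (assumed): max over a grid of Eppv(beta_1) with
    pi_i = expit(beta_1 z1_i). A finite grid only approximates the max over [a, b]."""
    return max(eppv_logistic(y, z_test, [expit(b * z) for z in z1], **kw)
               for b in beta_grid)


if __name__ == "__main__":
    print("plug-in Eppv:", eppv_logistic(Y, Z2, fit_null_pi(Y, Z1)))
    print("minimax Eppv:", eppv_minimax(Y, Z1, Z2, [0.0, 0.5, 1.0]))
```

Using the same seed for every grid point (common random numbers) keeps the Monte Carlo noise from driving the maximum.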

It is interesting to note that when there is no explanatory variable (other than the intercept) under the null hypothesis, the Eppv test reduces to Fisher's exact test; this happens because $\hat\pi_i = \bar Y$ for all $i$, so that permuting $\varepsilon$ is identical to permuting $Y$.
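The reduction to Fisher's exact test can be checked on a toy example. With $\hat\pi_i = \bar Y$ for every $i$, any draw of $\varepsilon$ given $Y$ satisfies $1_{\{\varepsilon_{\sigma(i)} < \bar Y\}} = Y_{\sigma(i)}$ for every permutation $\sigma$, so for a binary covariate $x$ the permutation distribution of the statistic is that of $\sum_i x_i Y_{\sigma(i)}$, which is hypergeometric. The sketch below (hypothetical data and function names; one-sided upper tail with ties counted as extreme) checks the identity on $\varepsilon$ and compares a full enumeration of the $n!$ rearrangements with the hypergeometric tail.

```python
import random
from itertools import permutations
from math import comb


def fisher_one_sided(y, x):
    """Hypergeometric upper tail P(T >= t_obs), T = number of cases among units with x = 1."""
    n, k, m = len(y), sum(x), sum(y)
    t_obs = sum(xi * yi for xi, yi in zip(x, y))
    tail = sum(comb(m, t) * comb(n - m, k - t) for t in range(t_obs, min(k, m) + 1))
    return tail / comb(n, k)


def permutation_pvalue_y(y, x):
    """Permutation p-value of T = sum_i x_i y_i over all n! rearrangements of y."""
    t_obs = sum(xi * yi for xi, yi in zip(x, y))
    hits = total = 0
    for perm in permutations(y):
        total += 1
        if sum(xi * yi for xi, yi in zip(x, perm)) >= t_obs:
            hits += 1
    return hits / total


def eps_permutation_matches_y(y, rng):
    """With pi_hat_i = ybar for all i, permuting eps gives the same indicators as permuting y."""
    ybar = sum(y) / len(y)
    eps = [rng.uniform(0.0, ybar) if yi == 1 else rng.uniform(ybar, 1.0) for yi in y]
    idx = list(range(len(y)))
    rng.shuffle(idx)
    return [1 if eps[j] < ybar else 0 for j in idx] == [y[j] for j in idx]


if __name__ == "__main__":
    y = [1, 1, 0, 1, 0, 0, 0, 0]   # hypothetical binary responses
    x = [1, 1, 1, 0, 0, 0, 0, 0]   # hypothetical binary exposure
    print(fisher_one_sided(y, x), permutation_pvalue_y(y, x))
```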

5 Simulation study 

We simulated a logistic regression model given by:

$$\mathrm{logit}(\pi_i) = \beta_0 + \beta_1 z_i^1 + \beta_2 z_i^2$$

with $\beta_0 = 0$, $\beta_1 = 1$, $z_i^1 = w_i^1 - 1$, $z_i^2 = (w_i^2 - 1)(z_i^1)^d$, where $w_i^1$ and $w_i^2$ are independent with exponential distributions. We tried $d = 0$, for which $z^1$ and $z^2$ are independent, and $d = 1$ and $d = -1$, which produce two different kinds of non-linear dependence between $z^1$ and $z^2$. Samples of sizes 30 and 15 were generated from this model. The problem was to test $H_0$: "$\beta_2 = 0$" at size $\alpha = 0.05$. The empirical sizes (for $\beta_2 = 0$) and powers (for $\beta_2 = 1$ when $n = 30$ and $\beta_2 = 2$ when $n = 15$) of the likelihood ratio (LR) test, the Wald test, a score test based on permutation of residuals (PR) and the Eppv test were estimated by simulation using 10000 replicates. We also tried a bootstrap test: among several possibilities we chose the one which seemed the most natural, namely a non-parametric bootstrap of the Wald test. The guidelines given in [5] were applied, that is, resampling $(\hat\beta_2^* - \hat\beta_2)/\hat\sigma_{\hat\beta_2^*}$, where $\hat\beta_2^*$ is the maximum likelihood estimate of $\beta_2$ for a resample and $\hat\sigma_{\hat\beta_2^*}$ is the estimated standard deviation of $\hat\beta_2^*$. This time-consuming test (using 499 resamples) was studied on only 1000 replicates. For simplicity, only marginal probabilities were estimated for all the tests, that is, we regenerated $z^1$ and $z^2$ at each replicate.
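One replicate of the simulation design can be generated as below. The exponential rate (taken as 1, so that $w - 1$ has mean zero) and the form $z_i^2 = (w_i^2 - 1)(z_i^1)^d$ with integer $d \in \{0, 1, -1\}$ are assumptions of this sketch, and the function name is hypothetical. Each call regenerates the covariates, in line with the marginal approach described above.

```python
import math
import random


def expit(t):
    """Numerically stable inverse logit (avoids overflow when |t| is large)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def simulate_dataset(n, beta2, d, rng, beta0=0.0, beta1=1.0):
    """Return (z1, z2, y) for logit(pi_i) = beta0 + beta1 z1_i + beta2 z2_i.

    z1_i = w1_i - 1 and z2_i = (w2_i - 1) * z1_i ** d, with w1, w2 i.i.d. Exp(1)
    (assumed rate) and d an integer in {0, 1, -1}.
    """
    z1, z2, y = [], [], []
    for _ in range(n):
        a = rng.expovariate(1.0) - 1.0
        b = (rng.expovariate(1.0) - 1.0) * a ** d
        pi = expit(beta0 + beta1 * a + beta2 * b)
        z1.append(a)
        z2.append(b)
        y.append(1 if rng.random() < pi else 0)
    return z1, z2, y


if __name__ == "__main__":
    rng = random.Random(2024)
    for d in (0, 1, -1):
        z1, z2, y = simulate_dataset(15, 0.0, d, rng)   # beta2 = 0: a type I error replicate
        print(d, sum(y), len(y))
```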

The results appear in Table 1 (with $\beta_2$ simply denoted $\beta$). The Wald test clearly tends to be conservative, while the LR test tends to be anti-conservative. These behaviours are more marked for $n = 15$ than for $n = 30$. The permutation-based tests respect the nominal size better, with a tendency to be conservative for $d = 1$; the Eppv test is more stable than the test based on permutation of residuals. The bootstrap Wald test is not really practical for $n = 15$, because many configurations generated by resampling are too particular and make the fitting algorithm fail to converge; the results of this test are therefore not displayed in Table 1. For $n = 30$ it is strongly anti-conservative: the estimated type I error rates are 0.088, 0.097 and 0.14 for $d = 0$, 1 and $-1$ respectively.
The power of the Eppv test is always higher than that of the test based on permutation of residuals, and at least as high as that of the Wald test (the two are equal only for $n = 30$, $d = 1$); it is sometimes lower than that of the likelihood ratio test, but the latter is not very reliable in the situations considered. In conclusion, when working with small samples and when a dependency between the factor studied and the other explanatory variables can be suspected, the Eppv test seems the most reliable among the tests considered here.

6 Illustration on real data 

Even in a large study, very small numbers may occur in some categories of interest. The small problem treated here for illustration is taken from a real study on the effect of wine consumption on the risk of developing dementia [9]. In this study, 2273 non-demented subjects were followed up for three years. Subjects were classified according to their wine consumption as non-drinkers, mild drinkers, and moderate or heavy drinkers. During follow-up, 99 cases of dementia developed. Potentially important confounding factors were age, gender and educational level (here coded as a binary variable: no primary diploma vs primary diploma or above). Overall, a logistic regression analysis showed that moderate wine consumption was a protective factor against dementia. However, if we try to analyze the data separately by gender (which is legitimate because both the course of dementia and drinking habits differ between genders), very small numbers occur. In particular, there were 28 dementia cases among 811 non-drinking women and cases among 44 moderate or heavy drinking women.
Table 1: Simulation results based on 10000 replicates of a logistic regression model comparing the Wald test, the likelihood ratio test (LR), the test based on permutation of residuals (PR) and the Eppv test; the nominal size of the tests is 0.05.





                          Wald    LR      PR      Eppv
n = 30, β = 0 (Type I error)
  d = 0                   0.044   0.063   0.051   0.052
  d = 1                   0.025   0.069   0.015   0.025
  d = -1                  0.020   0.080   0.062   0.046
n = 30, β = 1 (Power)
  d = 0                   0.45    0.53    0.47    0.48
  d = 1                   0.17    0.31    0.14    0.17
  d = -1                  0.81    0.91    0.85    0.88
n = 15, β = 0 (Type I error)
  d = 0                   0.020   0.072   0.049   0.049
  d = 1                   0.009   0.094   0.018   0.020
  d = -1                  0.007   0.094   0.066   0.041
n = 15, β = 2 (Power)
  d = 0                   0.22    0.52    0.57    0.58
  d = 1                   0.10    0.41    0.19    0.22
  d = -1                  0.16    0.65    0.75    0.79
With such figures, a logistic regression with wine consumption as an explanatory variable fails to converge, so a Wald test cannot be used and a likelihood ratio test is probably not very reliable. For a one-sided alternative, Fisher's exact test gave a p-value of 0.21; when adjusting for age and educational level, we obtained p-values of 0.18 and 0.13 with the PR and Eppv tests respectively. On the basis of these data, and taking possible confounding factors into account, the hypothesis that wine consumption has no effect on the risk of dementia among women cannot be rejected.

In conclusion, the Eppv approach extends the ideas of permutation tests to complex problems. The bootstrap was also partly motivated by such an extension, but unlike the bootstrap, the Eppv approach keeps the idea of conditioning on the order statistic of an exchangeable vector.